Analyses of Swisscom data
Grid extras
Swisscom grid coordinates & IDs
Tile definitions were pulled from the API using
query_swisscom_heatmaps_api.py.
read_fun <- function(filename) {
  data <- jsonlite::fromJSON(filename)
  data <- jsonlite::flatten(data$tiles) %>%
    dplyr::as_tibble()
  # derive the postal code from the file name; the dot is escaped so it matches literally
  data$plz <- gsub("grid_|[.]json$", "", filename)
  data$plz <- gsub("data/swisscom/", "", data$plz)
  return(data)
}
doFuture::registerDoFuture()
future::plan("multisession", workers = 8)
grid <- plyr::ldply(.data = fs::dir_ls("data/swisscom/",
                                       regexp = "[0-9][.]json$"),
                    .fun = read_fun,
                    .id = NULL,
                    .parallel = TRUE) %>%
  as_tibble() %>%
  distinct()
Focusing on the test area of Bern city centre, which includes the following postal codes:
x <character>
# total N=6965 valid N=6965 mean=3044.06 sd=37.39
Value | N | Raw % | Valid % | Cum. %
---------------------------------------
3005 | 198 | 2.84 | 2.84 | 2.84
3006 | 604 | 8.67 | 8.67 | 11.51
3007 | 254 | 3.65 | 3.65 | 15.16
3008 | 445 | 6.39 | 6.39 | 21.55
3010 | 28 | 0.40 | 0.40 | 21.95
3011 | 138 | 1.98 | 1.98 | 23.93
3012 | 581 | 8.34 | 8.34 | 32.28
3013 | 176 | 2.53 | 2.53 | 34.80
3014 | 366 | 5.25 | 5.25 | 40.06
3018 | 590 | 8.47 | 8.47 | 48.53
3027 | 720 | 10.34 | 10.34 | 58.87
3073 | 509 | 7.31 | 7.31 | 66.17
3074 | 389 | 5.59 | 5.59 | 71.76
3084 | 526 | 7.55 | 7.55 | 79.31
3095 | 152 | 2.18 | 2.18 | 81.49
3097 | 182 | 2.61 | 2.61 | 84.11
3098 | 1107 | 15.89 | 15.89 | 100.00
<NA> | 0 | 0.00 | <NA> | <NA>
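Grid points were defined using the lower-left corner coordinates of each tile. They were then shifted by 50 m east and north, towards the cell centre, to better align with the grid (note that the code adds 51 m rather than 50 m to the x coordinate).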
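The arithmetic of the shift can be sketched on a single made-up point. The coordinates below are hypothetical, and the 50 m offset is read as half of an assumed 100 m tile edge; the sketch only mirrors the truncate-then-add pattern and is not the actual pipeline.

```r
# Hypothetical lower-left corner of one tile, in LV03 (EPSG:21781) metres
ll <- c(x = 600000.4, y = 199900.7)

# Assumed offset: half of an assumed 100 m tile edge
offset <- 50L

# Truncate to whole metres first, then shift east and north
centre <- as.integer(ll) + offset
names(centre) <- names(ll)
centre
```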
grid_sf <- grid %>%
  st_as_sf(coords = c("ll.x", "ll.y"),
           crs = 4326,
           remove = TRUE) %>%
  st_transform(21781) %>%
  mutate(x = st_coordinates(.)[, 1],
         y = st_coordinates(.)[, 2]) %>%
  select(-ur.x, -ur.y)
# shifting by 50m to the centre
grid_sf_50 <- grid_sf %>%
  st_drop_geometry() %>%
  mutate(x = as.integer(as.integer(x) + 51), # why on earth 1?
         y = as.integer(as.integer(y) + 50)) %>%
  st_as_sf(coords = c("x", "y"),
           crs = 21781,
           remove = FALSE)
Grid derived with Swisscom offset
Swisscom points were linked to the country grid derived in
01.Rmd, which provides the crucial tile ID
variable needed to link to the Heatmap API outputs.
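As a rough illustration of what a spatial join with `left = FALSE` does, the toy example below joins two made-up square cells to a single made-up point. All names and coordinates (`cells`, `pts`, `make_cell`, the `id` and `plz` values) are hypothetical, and it is only an assumption that the country grid consists of polygons like these.

```r
library(sf)

# Helper building a 100 m square from its lower-left corner (hypothetical)
make_cell <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 100, y0), c(x0 + 100, y0 + 100),
                        c(x0, y0 + 100), c(x0, y0))))
}

# Two adjacent toy cells carrying an ID
cells <- st_sf(id = c(1L, 2L),
               geometry = st_sfc(make_cell(0, 0), make_cell(100, 0),
                                 crs = 21781))

# One toy point, located at the centre of the first cell
pts <- st_sf(plz = "3005",
             geometry = st_sfc(st_point(c(50, 50)), crs = 21781))

# Inner join: cells without a matching point are dropped
joined <- st_join(cells, pts, left = FALSE)
joined
```

With `left = FALSE` only the cell containing the point is kept; the default `left = TRUE` would keep both cells and fill `plz` with `NA` for the second one.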
bern_plz <-
  read_rds("data/grid/country.Rds") %>%
  st_join(grid_sf_50,
          left = FALSE)
write_rds(bern_plz, "data/grid/bern_plz.Rds")
Study area coverage
Duplicate cells
Some cells in the grid are duplicated because they overlap two (or more?) PLZs and were therefore returned more than once.
x <lgl>
# total N=6965 valid N=6965 mean=0.11 sd=0.31
Value | N | Raw % | Valid % | Cum. %
---------------------------------------
FALSE | 6232 | 89.48 | 89.48 | 89.48
TRUE | 733 | 10.52 | 10.52 | 100.00
<NA> | 0 | 0.00 | <NA> | <NA>
Example:
They do have a unique ID, so they can easily be excluded in order to create correct visualizations (see #8). However, analyses based on PLZs, particularly aggregation of data, would have to determine the correct assignment of grid cells to PLZs, perhaps by using a (population-weighted?) centroid or something similar.
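A minimal sketch of flagging and excluding such duplicates for visualization is given below. The column names `tile_id`, `plz` and `dupli` and the toy values are hypothetical. The PLZ that survives is simply the first one encountered, so this does not address the assignment question raised above.

```r
library(dplyr)

# Toy data: tile "b" is returned for two PLZs (hypothetical names and values)
cells <- tibble(
  tile_id = c("a", "b", "b", "c"),
  plz     = c("3005", "3005", "3006", "3006")
)

# Flag every row whose tile ID occurs more than once
cells <- cells %>%
  group_by(tile_id) %>%
  mutate(dupli = n() > 1) %>%
  ungroup()

# Keep one row per tile for mapping; the first PLZ seen is retained
cells_unique <- cells %>%
  distinct(tile_id, .keep_all = TRUE)
```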